Smart Paper Notes

Differentiable Bilevel Programming for Stackelberg Congestion Games

Jiayang Li, Jing Yu, Qianni Wang, Boyi Liu, Zhaoran Wang, Yu Marco Nie

Category: Artificial Intelligence

2022-09-15

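A Stackelberg congestion game (SCG) is a bilevel program in which a leader aims to maximize their own gain by anticipating and manipulating the equilibrium state at which the followers settle by playing a congestion game. Large-scale SCGs are well known for their intractability and complexity. This study approaches SCGs through differentiable programming, which combines the latest developments in machine learning with conventional methodologies. The core idea is to represent the lower-level equilibrium problem by an evolution path formed by the imitative logit dynamics. This enables the use of automatic differentiation over the evolution path towards equilibrium, leading to a double-loop gradient descent algorithm. We further show that the fixation on the lower-level equilibrium may be a self-imposed computational obstacle. Instead, the leader may look ahead only a few steps along the followers' evolution path, while updating their decisions in sync with the followers through a co-evolution process. This revelation gives rise to a single-loop algorithm that is more efficient in terms of both memory consumption and computation time. Through numerical experiments that cover a wide range of benchmark problems, we find that the single-loop algorithm consistently strikes a good balance between solution quality and efficiency, outperforming not only the standard double-loop implementation but also other methods from the literature. Importantly, our results highlight both the wastefulness of "full anticipation" and the peril of "zero anticipation". If a quick heuristic is needed for solving a very large SCG, the proposed single-loop algorithm with a one-step look-ahead makes an ideal candidate.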

Translated by Google Translate
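
The single-loop idea in the abstract above (followers take one imitative-logit step while the leader differentiates through that single step) can be sketched on a toy problem. This is a minimal illustration and not the authors' implementation: the two-route Pigou-style network, the cost functions, the step sizes, and all function names are assumptions, and central finite differences stand in for automatic differentiation.

```python
import math

def logit_step(p, toll, r=1.0):
    """One imitative-logit update of the share p on the tolled route.

    Assumed toy network: the tolled route costs p + toll,
    the alternative route has a constant cost of 1.
    """
    w1 = p * math.exp(-r * (p + toll))
    w2 = (1.0 - p) * math.exp(-r * 1.0)
    return w1 / (w1 + w2)

def total_travel_time(p):
    """Leader's objective: total travel time, excluding toll payments."""
    return p * p + (1.0 - p) * 1.0

def lookahead_gradient(p, toll, r=1.0, eps=1e-6):
    """Derivative of the objective w.r.t. the toll after ONE follower step.

    Central finite differences are a stand-in for automatic differentiation.
    """
    up = total_travel_time(logit_step(p, toll + eps, r))
    down = total_travel_time(logit_step(p, toll - eps, r))
    return (up - down) / (2.0 * eps)

def single_loop(p=0.5, toll=0.0, r=1.0, lr=2.0, iters=500):
    """Leader and followers co-evolve: one gradient step, one logit step."""
    for _ in range(iters):
        grad = lookahead_gradient(p, toll, r)
        next_p = logit_step(p, toll, r)  # followers react to the current toll
        toll -= lr * grad                # leader uses a one-step look-ahead
        p = next_p
    return p, toll

p_final, toll_final = single_loop()
print(f"share on tolled route: {p_final:.3f}, toll: {toll_final:.3f}")
```

On this assumed network the equilibrium share is 1 - toll and total travel time is minimized at a share of 0.5, so both printed values should settle close to 0.5 without ever solving the lower-level equilibrium inside an inner loop.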

Counterfactual Reasoning for Out-of-distribution Multimodal Sentiment Analysis

Teng Sun, Wenjie Wang, Liqiang Jing, Yiran Cui, Xuemeng Song, Liqiang Nie

Category: Natural Language Processing | Artificial Intelligence

2022-07-24

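Existing studies on multimodal sentiment analysis rely heavily on the textual modality and unavoidably induce spurious correlations between textual words and sentiment labels. This greatly hinders the generalization ability of the models. To address this problem, we define the task of out-of-distribution (OOD) multimodal sentiment analysis. This task aims to estimate and mitigate the adverse effect of the textual modality in order to achieve strong generalization. To this end, we embrace causal inference, which inspects causal relationships via a causal graph. From the graph, we find that the spurious correlations are attributed to the direct effect of the textual modality on the model prediction, while the indirect effect is more reliable because it considers multimodal semantics. Inspired by this, we devise a model-agnostic counterfactual framework for multimodal sentiment analysis, which captures the direct effect of the textual modality via an extra text model and estimates the indirect effect with a multimodal model. During inference, we first estimate the direct effect by counterfactual inference, and then subtract it from the total effect of all modalities to obtain the indirect effect for reliable prediction. Extensive experiments show the superior effectiveness and generalization ability of our proposed framework.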
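
The inference rule described above (total effect minus the direct effect of text) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the three-class example logits, and the plain subtraction of raw logits are hypothetical choices.

```python
from typing import List

def debias_logits(total: List[float], text_only: List[float]) -> List[float]:
    """Subtract the text-only (direct) effect from the multimodal (total) effect."""
    if len(total) != len(text_only):
        raise ValueError("logit vectors must have the same length")
    return [t - d for t, d in zip(total, text_only)]

def argmax(values: List[float]) -> int:
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical logits for the classes [negative, neutral, positive].
total_effect = [2.0, 0.5, 1.8]   # output of the multimodal model
direct_effect = [1.5, 0.2, 0.3]  # output of the extra text-only model
indirect_effect = debias_logits(total_effect, direct_effect)

print("prediction from total effect:   ", argmax(total_effect))
print("prediction from indirect effect:", argmax(indirect_effect))
```

In this made-up example the text-only branch pushes strongly towards the first class, so removing its contribution changes the predicted class.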

Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model

Xiaolin Chen, Xuemeng Song, Liqiang Jing, Shuo Li, Linmei Hu, Liqiang Nie

Category: Natural Language Processing

2022-07-16

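Text response generation for multimodal task-oriented dialog systems, which aims to generate a proper text response given the multimodal context, is an essential yet challenging task. Although existing efforts have achieved compelling success, they still suffer from two pivotal limitations: 1) they overlook the benefit of generative pre-training, and 2) they ignore knowledge related to the textual context. To address these limitations, we propose a novel dual knowledge-enhanced generative pretrained language model for multimodal task-oriented dialog systems (DKMD), consisting of three key components: dual knowledge selection, dual knowledge-enhanced context learning, and knowledge-enhanced response generation. Specifically, the dual knowledge selection component selects the related knowledge according to both the textual and visual modalities of the given context. Thereafter, the dual knowledge-enhanced context learning component seamlessly integrates the selected knowledge into multimodal context learning from both global and local perspectives, and also explores the cross-modal semantic relation. Moreover, the knowledge-enhanced response generation component comprises a revised BART decoder, in which an additional dot-product knowledge-decoder attention sub-layer is introduced to explicitly utilize the knowledge for text response generation. Extensive experiments on a public dataset verify the superiority of the proposed DKMD over state-of-the-art competitors.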
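
The dot-product attention that the decoder applies over selected knowledge can be illustrated with a generic sketch. This is not the DKMD sub-layer itself: the function name, the scaling by the square root of the dimension, and the toy vectors are assumptions, and a real decoder would use learned projections and batched tensors.

```python
import math
from typing import List, Tuple

def dot_product_attention(
    query: List[float],
    keys: List[List[float]],
    values: List[List[float]],
) -> Tuple[List[float], List[float]]:
    """Attend from one decoder state (query) over knowledge entries (keys/values)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    weights = [e / norm for e in exps]
    # Weighted sum of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return output, weights

# Hypothetical decoder state and two knowledge entries.
decoder_state = [1.0, 0.0]
knowledge_keys = [[1.0, 0.0], [0.0, 1.0]]
knowledge_values = [[10.0, 0.0], [0.0, 10.0]]
context, attn = dot_product_attention(decoder_state, knowledge_keys, knowledge_values)
print("attention weights:", attn)
print("knowledge context:", context)
```

The knowledge entry whose key aligns with the decoder state receives the larger weight, so its value dominates the context vector passed on to generation.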

EGFN: Efficient Geometry Feature Network for Fast Stereo 3D Object Detection

Aqi Gao, Yanwei Pang, Jing Nie, Jiale Cao, Yishun Guo

Category: Computer Vision

2021-11-28

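Fast stereo-based 3D object detectors have recently made great progress in terms of inference time. However, their accuracy lags far behind that of high-precision methods. We argue that the main reason is the missing or poor 3D geometry feature representation in fast stereo-based methods. To solve this problem, we propose an efficient geometry feature generation network (EGFN). The key to our EGFN is an efficient and effective 3D geometry feature representation (EGFR) module. In the EGFR module, lightweight cost volume features are first generated, then efficiently converted into 3D space, and finally multi-scale feature enhancement is conducted in both image and 3D spaces to obtain the 3D geometry features: enhanced lightweight voxel features. In addition, we introduce a novel multi-scale knowledge distillation strategy to guide multi-scale 3D geometry feature learning. Experimental results on a public benchmark test set show that the proposed EGFN outperforms YOLOStereo3D, an advanced fast method, by 5.16% on mAP$_{3d}$ at a cost of only 12 ms, and hence achieves a better trade-off between accuracy and efficiency for stereo 3D object detection. Our code will be publicly available.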

On the Opportunities and Risks of Foundation Models

Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill

Category: Machine Learning | Artificial Intelligence

2021-08-16

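AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.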

On Bilevel Optimization without Lower-level Strong Convexity

Lesi Chen, Jing Xu, Jingzhao Zhang

Category: Artificial Intelligence | Machine Learning

2023-01-02

Theoretical properties of bilevel problems are well studied when the lower-level problem is strongly convex. In this work, we focus on bilevel optimization problems without the strong-convexity assumption. In these cases, we first show that common local optimality measures such as the KKT condition or regularization can lead to undesired consequences. Then, we aim to identify the mildest conditions that make bilevel problems tractable. We identify two classes of growth conditions on the lower-level objective that lead to continuity. Under these assumptions, we show that the local optimality of the bilevel problem can be defined via the Goldstein stationarity condition of the hyper-objective. We then propose the Inexact Gradient-Free Method (IGFM) to solve the bilevel problem, using an approximate zeroth-order oracle that is of independent interest. Our non-asymptotic analysis demonstrates that the proposed method can find a $(\delta, \varepsilon)$ Goldstein stationary point for bilevel problems with a zeroth-order oracle complexity that is polynomial in $d$, $1/\delta$, and $1/\varepsilon$.
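
To make the phrase "zeroth-order oracle" concrete, the following sketch runs a generic two-point randomized gradient estimator on a nonsmooth toy function. It is not IGFM: the estimator, the toy objective, the step size, and every name below are assumptions chosen for illustration, and the function values here are exact rather than approximate.

```python
import math
import random

def random_unit_vector(dim, rng):
    """Sample a direction uniformly from the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def zeroth_order_gradient(f, x, delta, rng):
    """Two-point estimate that uses only function values of f."""
    d = len(x)
    u = random_unit_vector(d, rng)
    x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
    x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
    scale = d * (f(x_plus) - f(x_minus)) / (2.0 * delta)
    return [scale * ui for ui in u]

def gradient_free_descent(f, x0, delta=0.1, step=0.05, iters=500, seed=0):
    """Plain descent driven by the zeroth-order estimate."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        g = zeroth_order_gradient(f, x, delta, rng)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

def toy(x):
    """Nonsmooth toy objective with minimizer (0, 1)."""
    return abs(x[0]) + abs(x[1] - 1.0)

x_start = [2.0, -1.0]
x_final = gradient_free_descent(toy, x_start)
print("start value:", toy(x_start), "final value:", toy(x_final))
```

The parameter `delta` plays the role of the smoothing radius: the iterates hover within a neighborhood of the minimizer whose size depends on `delta` and `step`, which is the kind of approximate stationarity that a $(\delta, \varepsilon)$ guarantee describes.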

How would Stance Detection Techniques Evolve after the Launch of ChatGPT?

Bowen Zhang, Daijun Ding, Liwen Jing

Category: Natural Language Processing

2022-12-30

Stance detection refers to the task of extracting the standpoint (Favor, Against, or Neither) towards a target in given texts. Such research has gained increasing attention with the proliferation of social media content. The conventional approach to stance detection converts it into a text classification task. Deep learning models have already replaced rule-based models and traditional machine learning models in solving such problems. Current deep neural networks face two main challenges: insufficient labeled data and information in social media posts, and the unexplainable nature of deep learning models. A new pre-trained language model, ChatGPT, was launched on Nov 30, 2022. For stance detection tasks, our experiments show that ChatGPT can achieve SOTA or similar performance on commonly used datasets including SemEval-2016 and P-Stance. At the same time, ChatGPT can provide explanations for its own predictions, which is beyond the capability of any existing model. The explanations for the cases in which it cannot provide classification results are especially useful. ChatGPT has the potential to be the best AI model for stance detection tasks in NLP, or at least to change the research paradigm of this field. ChatGPT also opens up the possibility of building explanatory AI for stance detection.

TextBox 2.0: A Text Generation Library with Pre-trained Language Models

Tianyi Tang, Junyi Li, Zhipeng Chen, Yiwen Hu, Zhuohao Yu, Wenxun Dai, Zican Dong, Xiaoxue Cheng, Yuhao Wang, Wayne Xin Zhao

Category: Natural Language Processing

2022-12-26

To facilitate research on text generation, this paper presents a comprehensive and unified library, TextBox 2.0, focusing on the use of pre-trained language models (PLMs). To be comprehensive, our library covers $13$ common text generation tasks and their corresponding $83$ datasets and further incorporates $45$ PLMs covering general, translation, Chinese, dialogue, controllable, distilled, prompting, and lightweight PLMs. We also implement $4$ efficient training strategies and provide $4$ generation objectives for pre-training new PLMs from scratch. To be unified, we design the interfaces to support the entire research pipeline (from data loading to training and evaluation), ensuring that each step can be fulfilled in a unified way. Despite the rich functionality, it is easy to use our library, either through the friendly Python API or command line. To validate the effectiveness of our library, we conduct extensive experiments and exemplify four types of research scenarios. The project is released at the link: https://github.com/RUCAIBox/TextBox.

TriPINet: Tripartite Progressive Integration Network for Image Manipulation Localization

Wei-Yun Liang, Jing Xu, Xiao Jin

Category: Computer Vision

2022-12-25

Image manipulation localization aims at identifying the forged regions within a test image. Although many outstanding prior works have been proposed for this task, two issues still need further study: 1) how to fuse diverse types of features with forgery clues; 2) how to progressively integrate multistage features for better localization performance. In this paper, we propose a tripartite progressive integration network (TriPINet) for end-to-end image manipulation localization. First, we extract both visually perceptible information, e.g., RGB input images, and visually imperceptible features, e.g., frequency and noise traces, for forensic feature learning. Second, we develop a guided cross-modality dual-attention (gCMDA) module to fuse different types of forgery clues. Third, we design a set of progressive integration squeeze-and-excitation (PI-SE) modules to improve localization performance by appropriately incorporating multiscale features in the decoder. Extensive experiments are conducted to compare our method with state-of-the-art image forensics approaches. The proposed TriPINet obtains competitive results on several benchmark datasets.

Multi-queue Momentum Contrast for Microvideo-Product Retrieval

Yali Du, Yinwei Wei, Wei Ji, Fan Liu, Xin Luo, Liqiang Nie

Category: Computer Vision

2022-12-22

The booming development and huge market of micro-videos bring new e-commerce channels for merchants. Currently, more micro-video publishers prefer to embed relevant ads into their micro-videos, which not only provides them with business income but also helps audiences discover products that interest them. However, because micro-videos are recorded with unprofessional equipment, involve various topics, and include multiple modalities, it is challenging to locate the products related to micro-videos efficiently, appropriately, and accurately. We formulate the microvideo-product retrieval task, which is the first attempt to explore retrieval between multi-modal and multi-modal instances. A novel approach named Multi-Queue Momentum Contrast (MQMC) network is proposed for bidirectional retrieval, consisting of uni-modal feature and multi-modal instance representation learning. Moreover, a discriminative selection strategy with a multi-queue is used to distinguish the importance of different negatives based on their categories. We collect two large-scale microvideo-product datasets (MVS and MVS-large) for evaluation and manually construct the hierarchical category ontology, which covers sundry products in daily life. Extensive experiments show that MQMC outperforms the state-of-the-art baselines. Our replication package (including code, dataset, etc.) is publicly available at https://github.com/duyali2000/MQMC.
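
Two ingredients named in the abstract above, a momentum-updated encoder and category-aware queues of negatives, can be sketched generically. This is not the MQMC implementation from the linked repository: the class name `MultiQueue`, the momentum value, the queue length, and the split into same-category and other-category negatives are hypothetical, and the abstract does not state how the two groups are weighted.

```python
from collections import deque

def momentum_update(key_params, query_params, m=0.999):
    """Move the key encoder slowly towards the query encoder."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]

class MultiQueue:
    """One bounded FIFO queue of negative features per product category."""

    def __init__(self, max_len):
        self.max_len = max_len
        self.queues = {}

    def push(self, category, feature):
        """Store a feature; the oldest entry of a full queue is dropped."""
        queue = self.queues.setdefault(category, deque(maxlen=self.max_len))
        queue.append(feature)

    def negatives(self, category):
        """Return (same-category, other-category) negatives for a query."""
        same = list(self.queues.get(category, ()))
        other = [f for c, q in self.queues.items() if c != category for f in q]
        return same, other

# Hypothetical usage with scalar "features" for brevity.
bank = MultiQueue(max_len=2)
for feature in (0.1, 0.2, 0.3):
    bank.push("shoes", feature)
bank.push("phones", 0.9)
hard, easy = bank.negatives("shoes")
print("same-category negatives:", hard)
print("other-category negatives:", easy)
print("key params:", momentum_update([1.0], [0.0], m=0.9))
```

Keeping separate queues lets a training loss treat negatives from the query's own category differently from those of unrelated categories, which is one plausible reading of the discriminative selection strategy.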
